Abstract

In this paper, we propose a simple procedure to construct (decodable) good codes with any given 
alphabet (of moderate size) for any given (rational) code rate to achieve any given target error
performance (of interest) over additive white Gaussian noise (AWGN) channels. We start with constructing
codes over groups for any given code rate. This can be done in an extremely simple way if we ignore
the error performance requirement for the time being. Actually, this can be satisfied by repetition (R)
codes and uncoded (UN) transmission along with the time-sharing technique. The resulting codes are simply
referred to as RUN codes for convenience. The encoding/decoding algorithms for RUN codes are almost 
trivial. In addition, the performance can be easily analyzed. It is not difficult to imagine that a RUN 
code usually performs far away from the corresponding Shannon limit. Fortunately, the performance 
can be improved as required by spatially coupling the RUN codes via block Markov superposition 
transmission (BMST), resulting in the BMST-RUN codes. Simulation results show that the BMST-RUN 
codes perform well (within one dB away from Shannon limits) for a wide range of code rates and 
outperform the BMST with bit-interleaved coded modulation (BMST-BICM) scheme. 
Block Markov superposition transmission (BMST), codes over groups, spatial coupling, time-sharing.

I. Introduction

Since the invention of turbo codes [1] and the rediscovery of low-density parity-check (LDPC) 
codes [2], many turbo/LDPC-like codes have been proposed in the past two decades. Among 
them, the convolutional LDPC codes [3], recast as spatially coupled LDPC (SC-LDPC) codes 
in [4], exhibit a threshold saturation phenomenon and were proved to have better performance 
than their block counterparts. In a certain sense, the terminology “spatial coupling” is more
general, as it can be interpreted as making connections among independent subgraphs, or equivalently,
as introducing memory among successive independent transmissions. With this interpretation, 
braided block codes [5] and staircase codes [6], as the convolutional versions of (generalized) 
product codes, can be classified as spatially coupled codes. In [7], the spatially coupled version 
of turbo codes was proposed, whose belief propagation (BP) threshold is also better than that 
of the uncoupled ensemble. 

Recently, block Markov superposition transmission (BMST) [8-10] was proposed, which can 
also be viewed as the spatial coupling of generator matrices of short codes. The original BMST 
codes are defined over the binary field $\mathbb{F}_2$. In [9], it has been pointed out that any code with
fast encoding algorithms and soft-in soft-out (SISO) decoding algorithms can be taken as the
basic code. For example, one can take the Hadamard transform (HT) coset codes as the basic
codes, resulting in a class of multiple-rate codes with rates ranging from $1/2^p$ to $(2^p - 1)/2^p$,
where $p$ is a positive integer [11,12]. Even more flexibly, one can use the repetition and/or
single-parity-check (RSPC) codes as the basic codes to construct a class of multiple-rate codes
with rates ranging from $1/N$ to $(N - 1)/N$, where $N > 1$ is an integer [13]. It has been
verified by simulation that the construction approach is applicable not only to binary phase-shift 
keying (BPSK) modulation but also to bit-interleaved coded modulation (BICM) [14], spatial 
modulation [15], continuous phase modulation (CPM) [16], and intensity modulation in visible 
light communications (VLC) [17]. 

In this paper, we propose a procedure to construct codes over groups, which extends the 
construction of BMST-RSPC codes [13] in the following two aspects. First, we allow uncoded 
symbols occurring in the basic codes. Hence the encoding/decoding algorithms for the basic codes 
become simpler. Second, we derive a performance union bound for the repetition codes with 
any given signal mapping, which is critical for designing good BMST codes without invoking
simulations. We will not argue that the BMST construction can always deliver better codes than
other existing constructions.¹ Rather, we argue that the proposed one is more flexible in the sense
that it applies to any given signal set (of moderate size), any given (rational) code rate and any 
target error performance (of interest). We start with constructing group codes, referred to as RUN 
codes, with any given rate by time-sharing between repetition (R) codes and/or uncoded (UN) 
transmission. By transmitting the RUN codes in the BMST manner, we can have a class of good 
codes (called BMST-RUN codes). The performance of a BMST-RUN code is closely related to 
the encoding memory and can be predicted analytically in the high signal-to-noise ratio (SNR) 
region with the aid of the readily derived union bound. Simulation results show that the
BMST-RUN codes can approach the Shannon limits at any given target error rate (of interest) in a wide
range of code rates over both additive white Gaussian noise (AWGN) channels and Rayleigh 
flat fading channels. 

The pragmatic reader may question the necessity to construct codes over high-order signal 
constellations, since bandwidth efficiency can also be attained by BICM with binary codes. 
However, in addition to the flexibility of the construction, the BMST-RUN codes have the following
competitive advantages. 

• BMST-RUN codes can be easily designed to obtain shaping gain in at least two ways. One is 
designing codes directly over a well-shaped signal constellation, say, non-uniformly spaced 
constellation [18]. The other is implementing Gallager mapping for conventional signal 
constellations [19]. In both cases, neither optimization for bit-mapping (at the transmitter) 
nor iterations between decoding and demapping (at the receiver) are required. 

• BMST-RUN codes can be defined over signal sets of any size, such as 3-ary pulse amplitude 
modulation (3-PAM) and 5-PAM, which can be useful to transmit real samples directly [20]. 

The rest of this paper is organized as follows. In Section II, we briefly review the
BMST technique. In Section III, we discuss constructing group codes with any given signal set 
and any given code rate. In Section IV, we propose the construction method of BMST-RUN 
codes and discuss the performance lower bound. In Section V, we give simulation results and 
make a performance comparison between the BMST-RUN codes and the BMST-BICM scheme. 
In Section VI, we conclude this paper. 

¹ Actually, compared with SC-LDPC codes, the BMST codes usually have a higher error floor. However, the existence of the
high error floor is not a big issue since it can be lowered if necessary by increasing the encoding memory.

II. Review of Binary BMST Codes


Binary BMST codes are convolutional codes with large constraint lengths [8,9]. Typically, a
binary BMST code of memory $m$ consists of a short code (called the basic code) and at most
$m + 1$ interleavers [10]. Let $\mathcal{C}[n, k]$ be the basic code defined by a $k \times n$ generator matrix $G$ over
the binary field $\mathbb{F}_2$. Denote $u^{(0)}, u^{(1)}, \dots, u^{(L-1)}$ as $L$ blocks of data to be transmitted, where
$u^{(t)} \in \mathbb{F}_2^k$ for $0 \le t \le L - 1$. Then, the encoding output $c^{(t)} \in \mathbb{F}_2^n$ at time $t$ can be expressed
as [10]

$$c^{(t)} = u^{(t)} G \Pi_0 + u^{(t-1)} G \Pi_1 + \cdots + u^{(t-m)} G \Pi_m, \tag{1}$$

where $u^{(t)}$ is initialized to be $0 \in \mathbb{F}_2^k$ for $t < 0$ and $\Pi_0, \dots, \Pi_m$ are $m + 1$ permutation matrices
of order $n$. For $L \le t \le L + m - 1$, the zero message sequence $u^{(t)} = 0 \in \mathbb{F}_2^k$ is input into the
encoder for termination. Then, $c^{(t)}$ is mapped to a signal vector $s^{(t)}$ and transmitted over the
channel, resulting in a received vector $y^{(t)}$.

At the receiver, the decoder executes the sliding-window decoding (SWD) algorithm to recover 
the transmitted data $u^{(0)}, \dots, u^{(L-1)}$ [8,9]. Specifically, for an SWD algorithm with a decoding
delay $d$, the decoder takes $y^{(t)}, \dots, y^{(t+d)}$ as inputs to recover $u^{(t)}$ at time $t + d$, which is similar
to the window decoding (WD) of the SC-LDPC codes [21-23]. The structure of the BMST 
codes also admits a two-phase decoding (TPD) algorithm [10], which can be used to reduce the 
decoding delay and to predict the performance in the extremely low bit-error-rate (BER) region. 

As discussed in [9], binary BMST codes have the following two attractive features. 

1) Any code (linear or nonlinear) can be the basic code as long as it has fast encoding 
algorithms and SISO decoding algorithms. 

2) Binary BMST codes have a simple genie-aided lower bound when transmitted over AWGN 
channels using BPSK modulation, which shows that the maximum extra coding gain can 
approach $10\log_{10}(m + 1)$ dB compared with the basic code. The tightness of this simple
lower bound in the high SNR region under the SWD algorithm has been verified by both 
the simulation and the extrinsic information transfer (EXIT) chart analysis [24]. 

Based on the above two facts, a general procedure has been proposed for constructing
capacity-approaching codes at any given target error rate [10]. Suppose that we want to construct a binary
BMST code of rate $R$ at a target BER of $p_{\text{target}}$. First, we find a rate-$R$ short code $\mathcal{C}$ as the
basic code. Then, we can determine the encoding memory $m$ by

$$m = \left\lceil 10^{\frac{\gamma_{\text{target}} - \gamma_{\lim}}{10}} - 1 \right\rceil, \tag{2}$$

where $\gamma_{\text{target}}$ is the minimum SNR for the code $\mathcal{C}$ to achieve the BER $p_{\text{target}}$, $\gamma_{\lim}$ is the Shannon
limit corresponding to the rate $R$, and $\lceil x \rceil$ stands for the minimum integer greater than or equal to
$x$. Finally, by generating $m + 1$ interleavers uniformly at random, the BMST code is constructed.
With this method, we have constructed a binary BMST code of memory 30 using the Cartesian
product of the R code $[2, 1]^{5000}$, which has a predicted BER lower than $10^{-15}$ within one dB
away from the Shannon limit.
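As a rough illustration of (2), the memory can be computed directly from the two SNR values. The sketch below assumes that both SNRs are expressed in dB (which the division by 10 in the exponent suggests); the function name `bmst_memory` and the example numbers are hypothetical and not taken from the paper.

```python
import math


def bmst_memory(gamma_target_db: float, gamma_lim_db: float) -> int:
    """Encoding memory m from (2), with both SNRs assumed to be in dB."""
    gap_db = gamma_target_db - gamma_lim_db
    # The genie-aided bound allows at most 10*log10(m + 1) dB of extra
    # coding gain, so m + 1 must be at least 10^(gap/10).
    m = math.ceil(10.0 ** (gap_db / 10.0) - 1.0)
    return max(m, 0)  # a non-positive gap needs no memory


# Hypothetical numbers: the basic code needs 10 dB more than the Shannon limit.
print(bmst_memory(10.0, 0.0))
```

Read in the other direction, a memory of 30 corresponds to a maximum extra coding gain of $10\log_{10}(31) \approx 14.9$ dB over the basic code.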


III. RUN Codes over Groups 
A. System Model and Notations 

Consider a symbol set $\mathcal{M} = \{0, 1, \dots, q - 1\}$ and an $\ell$-dimensional signal constellation
$\mathcal{A} \subset \mathbb{R}^{\ell}$ of size $q$. The symbol set $\mathcal{M}$ can be treated as a group by defining the operation
$u \oplus w = (u + w) \bmod q$ for $u, w \in \mathcal{M}$. Let $\varphi$ be a (fixed) one-to-one mapping $\varphi : \mathcal{M} \to \mathcal{A}$.
Let $u \in \mathcal{M}$ be a symbol to be transmitted. For the convenience of performance analysis, instead
of transmitting $\varphi(u)$ directly, we transmit a signal $s = \varphi(u \oplus w)$, where $w$ is a sample of a
uniformly distributed random variable over $\mathcal{M}$ and assumed to be known at the receiver. The
received signal is $y = s + z$, where $+$ denotes the component-wise addition over $\mathbb{R}^{\ell}$ and $z$ is
an $\ell$-dimensional sample from a zero-mean white Gaussian noise process with variance $\sigma^2$ per
dimension. The SNR is defined as

SNR = (3) 

where $\|s\|^2$ is the squared Euclidean norm of $s$.

In this paper, for a discrete random variable $V$ over a finite set $\mathcal{V}$, we denote its a priori message
and extrinsic message as $P_V^{a}(v), v \in \mathcal{V}$ and $P_V^{e}(v), v \in \mathcal{V}$, respectively. A SISO decoding is a
process that takes a priori messages as inputs and delivers extrinsic messages as outputs. We
assume that the information messages are independent and uniformly distributed (i.u.d.) over
$\mathcal{M}$.


B. Repetition (R) Codes 
Fig. 1. A message $u$ is encoded into $v = (u, \dots, u)$ and transmitted over AWGN channels.


Fig. 1 shows the transmission of a message $u$ for $N$ times over AWGN channels.

1) Encoding: The encoder of an R code $\mathcal{C}[N, 1]$ over $\mathcal{M}$ takes as input a single symbol
$u \in \mathcal{M}$ and delivers as output an $N$-dimensional vector $v = (v_0, \dots, v_{N-1}) = (u, \dots, u)$.

2) Mapping: The $j$-th component $v_j$ of the codeword $v$ is mapped to the signal $s_j = \varphi(v_j \oplus w_j)$
for $j = 0, \dots, N - 1$, where $w = (w_0, \dots, w_{N-1})$ is a random vector sampled from an i.u.d.
process over $\mathcal{M}$.

3) Demapping: Let $y = (y_0, \dots, y_{N-1})$ be the received signal vector corresponding to the
codeword $v$. The a priori messages input to the decoder are computed as

$$P_{V_j}^{a}(v) \propto \exp\left(-\frac{\|y_j - \varphi(v \oplus w_j)\|^2}{2\sigma^2}\right), \quad v \in \mathcal{M}, \tag{4}$$

for $j = 0, \dots, N - 1$.

4) Decoding: The SISO decoding algorithm computes the a posteriori messages

$$P_U(u) \propto \prod_{0 \le \ell \le N-1} P_{V_\ell}^{a}(u), \quad u \in \mathcal{M}, \tag{5}$$

for making decisions and the extrinsic messages

$$P_{V_j}^{e}(v) \propto \prod_{0 \le \ell \le N-1,\ \ell \ne j} P_{V_\ell}^{a}(v), \quad v \in \mathcal{M}, \tag{6}$$

for $j = 0, \dots, N - 1$ for iterative decoding when coupled with other sub-systems.

5) Complexity: Both the encoding/mapping and the demapping/decoding have linear computational
complexity per coded symbol.

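6) Performance: Let $\hat{u}$ denote the hard decision output. The performance is measured by the
symbol-error-rate (SER) $\mathrm{SER} = \Pr\{\hat{U} \ne U\} = \frac{1}{q} \sum_{u \in \mathcal{M}} \Pr\{\hat{U} \ne U \mid U = u\}$. Define $e = \hat{u} \ominus u$,
where $\ominus$ denotes the subtraction under modulo-$q$ operation. Due to the existence of the random
vector $w$, the performance is irrelevant to the transmitted symbol $u$. We define

$$D_e(X) \triangleq \frac{1}{q} \sum_{w \in \mathcal{M}} X^{\|\varphi(w) - \varphi(e \oplus w)\|^2} \tag{7}$$

as the average Euclidean distance enumerating function (EDEF) corresponding to the error $e$,
where $X$ is a dummy variable. Then, the average EDEF $B^{(N)}(X)$ for the R code $\mathcal{C}[N, 1]$ over
all possible messages $u$ and all possible vectors $w$ can be computed as

$$B^{(N)}(X) = \sum_{e \in \mathcal{M},\, e \ne 0} \frac{1}{q} \sum_{u \in \mathcal{M}} \frac{1}{q^N} \sum_{w \in \mathcal{M}^N} X^{\sum_{j=0}^{N-1} \|\varphi(u \oplus w_j) - \varphi(u \oplus e \oplus w_j)\|^2} = \sum_{e \in \mathcal{M},\, e \ne 0} \left(D_e(X)\right)^N \triangleq \sum_{\delta} B_{\delta}^{(N)} X^{\delta^2}, \tag{8}$$

where $B_{\delta}^{(N)}$ denotes the average number of signal pairs $(s, \hat{s})$ with Euclidean distance $\delta$,
$s = (\varphi(u \oplus w_0), \dots, \varphi(u \oplus w_{N-1}))$ and $\hat{s} = (\varphi(\hat{u} \oplus w_0), \dots, \varphi(\hat{u} \oplus w_{N-1}))$. The performance
under the mapping $\varphi$ can be upper-bounded by the union bound as

$$\mathrm{SER} = f_{\varphi,N}(\mathrm{SNR}) \le \sum_{\delta} B_{\delta}^{(N)} Q\left(\frac{\delta}{2\sigma}\right), \tag{9}$$

where $Q\left(\frac{\delta}{2\sigma}\right)$ is the pair-wise error probability with $Q(x) = \int_x^{+\infty} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{z^2}{2}\right) dz$.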

From the above derivation, we can see that the performance bounds of the R codes are
related to the mapping $\varphi$. In this paper, we consider as examples the BPSK, the signal set
$\{-1, 0, +1\}$ (denoted as 3-PAM), 4-PAM, 8-ary phase-shift keying (8-PSK) modulation, 16-ary
quadrature amplitude modulation (16-QAM), and 16-PAM, which are depicted in Fig. 2 along with
mappings denoted by $\varphi_0, \dots, \varphi_9$ as specified in the figure. Fig. 3 and Fig. 4 show performance
bounds for several R codes defined with the considered constellations. From the figures, we have
the following observations.

1) The performance gap between the code $\mathcal{C}[N, 1]$ and the uncoded transmission, when
measured by the SNR instead of $E_b/N_0$, is roughly $10\log_{10}(N)$ dB.

2) Given a signal constellation, mappings that are universally good for all R codes may not
exist. For example, as shown in Fig. 4, $\varphi_2$ is better than $\varphi_3$ for rate 1/63 ($N = 63$) but
becomes worse for rate 1/7 ($N = 7$).
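The union bound (9) can be evaluated numerically by expanding the EDEF as a polynomial in $X$. The following is a minimal sketch rather than the authors' implementation. It assumes that the SNR equals the average symbol energy divided by $\sigma^2$, and the labelings `bpsk` and `pam4` are hypothetical stand-ins for the mappings of Fig. 2, which are not reproduced here.

```python
import math
from collections import defaultdict


def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def edef(points, e):
    """D_e(X) of (7) as {squared distance: coefficient}; points[k] plays phi(k)."""
    q = len(points)
    poly = defaultdict(float)
    for w in range(q):
        a, b = points[w], points[(e + w) % q]
        d2 = sum((x - y) ** 2 for x, y in zip(a, b))
        poly[round(d2, 9)] += 1.0 / q
    return dict(poly)


def poly_mul(p1, p2):
    """Multiply two polynomials in X stored as {exponent: coefficient}."""
    out = defaultdict(float)
    for d1, c1 in p1.items():
        for d2, c2 in p2.items():
            out[round(d1 + d2, 9)] += c1 * c2
    return dict(out)


def union_bound(points, n_rep, snr_db):
    """Right-hand side of (9) for the R code C[n_rep, 1].

    Assumption: SNR = (average symbol energy) / sigma^2.
    """
    q = len(points)
    energy = sum(sum(x * x for x in p) for p in points) / q
    sigma = math.sqrt(energy / 10.0 ** (snr_db / 10.0))
    bound = 0.0
    for e in range(1, q):  # nonzero errors only
        base = edef(points, e)
        poly = {0.0: 1.0}
        for _ in range(n_rep):  # (D_e(X))^N as in (8)
            poly = poly_mul(poly, base)
        bound += sum(c * q_func(math.sqrt(d2) / (2.0 * sigma))
                     for d2, c in poly.items())
    return bound


bpsk = [(-1.0,), (1.0,)]                   # hypothetical labeling
pam4 = [(-3.0,), (-1.0,), (1.0,), (3.0,)]  # natural labeling, hypothetical
print(union_bound(bpsk, 1, 6.0), union_bound(pam4, 7, 6.0))
```

For BPSK the bound of $\mathcal{C}[N, 1]$ at a given SNR coincides with that of uncoded transmission at an SNR that is $10\log_{10}(N)$ dB higher, which matches the first observation above.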
Fig. 2. Examples of signal constellations and mappings.


C. Time-Sharing 

With repetition codes over groups, we are able to implement code rates $\frac{1}{N}$ for any given
integer $N \ge 1$. To implement other code rates, we turn to the time-sharing technique. To be
precise, let $R = \frac{P}{Q}$ be the target rate. There must exist a unique $N \ge 1$ such that $\frac{1}{N+1} < \frac{P}{Q} \le \frac{1}{N}$.
Then, we can implement a code by time-sharing between the code $\mathcal{C}[N + 1, 1]$ and the code
$\mathcal{C}[N, 1]$, which is equivalent to encoding $\alpha P$ information symbols with the code $\mathcal{C}[N + 1, 1]$ and
the remaining $(1 - \alpha)P$ symbols with the code $\mathcal{C}[N, 1]$, where $\alpha = \frac{Q}{P} - N$ is the time-sharing
factor. Apparently, to construct codes with rate $R > \frac{1}{2}$, we need time-sharing between the code
$\mathcal{C}[2, 1]$ and the uncoded transmission. For this reason, we call this class of codes RUN codes,
which consist of the R codes and codes obtained by time-sharing between the R codes and/or
the uncoded transmission. We denote a RUN code of rate $\frac{P}{Q}$ as $\mathcal{C}_{\mathrm{RUN}}[Q, P]$. Replacing in Fig. 1
the R codes with the RUN codes, we then have a coding system that can transmit messages with
any given code rate over any given signal set.

Fig. 3. Performances and bounds of RUN codes. The “rate” in the legend of this figure (or other similar figures in this paper)
refers to the code rate. A rate-$R$ code over a $q$-ary constellation has a spectral efficiency of $R\log_2(q)$ in bits per symbol, at
which the Shannon limit is determined.

Fig. 4. Performances and bounds of R codes with 4-PAM under different mappings.

1) Encoding: Let $u \in \mathcal{M}^P$ be the message sequence. The encoder of the code $\mathcal{C}_{\mathrm{RUN}}[Q, P]$
encodes the left-most $\alpha P$ symbols of $u$ into $\alpha P$ codewords of $\mathcal{C}[N + 1, 1]$ and the remaining
symbols into $(1 - \alpha)P$ codewords of $\mathcal{C}[N, 1]$.

2) Decoding: The decoding is equivalent to decoding separately $\alpha P$ codewords of $\mathcal{C}[N + 1, 1]$
and $(1 - \alpha)P$ codewords of $\mathcal{C}[N, 1]$.

3) Complexity: Both the encoding/mapping and the demapping/decoding have the same
complexity as the R codes.

4) Performance: The performance of the RUN code of rate $R = \frac{P}{Q}$ is given by

$$\mathrm{SER} = \alpha \cdot f_{\varphi,N+1}(\mathrm{SNR}) + (1 - \alpha) \cdot f_{\varphi,N}(\mathrm{SNR}), \tag{10}$$

which can be upper-bounded with the aid of (9). Performances and bounds of several RUN
codes defined with BPSK modulation, 3-PAM, 4-PAM, 8-PSK modulation, or 16-QAM are
shown in Fig. 3 and Fig. 4. We notice that the union bounds with BPSK modulation are the
exact performances, while those with other signal sets are upper bounds to the performances. We
also notice that the upper bounds become tight as the SER is lower than $10^{-2}$ for all other signal
sets. Not surprisingly, the performances of the RUN codes are far away from the corresponding
Shannon limits (more than 5 dB) at the SER lower than $10^{-2}$.
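The time-sharing rule and the resulting encoder and decoder can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the constellation is one-dimensional, the labeling $\varphi(k) = k - 1$ for 3-PAM is hypothetical, and the function names are not from the paper. The decoder works in the log domain, adding the exponents of (4) over the repetitions of each symbol, which is equivalent to the product in (5).

```python
import random


def time_sharing(P, Q):
    """Return (N, n_long): N with 1/(N+1) < P/Q <= 1/N and n_long = alpha * P."""
    if not 0 < P <= Q:
        raise ValueError("need 0 < P <= Q")
    N = Q // P
    return N, Q - N * P  # alpha = Q/P - N, so alpha * P = Q - N * P


def run_encode(u, P, Q):
    """Repeat the first alpha*P symbols N+1 times and the rest N times."""
    N, n_long = time_sharing(P, Q)
    v = []
    for i, sym in enumerate(u):
        v.extend([sym] * (N + 1 if i < n_long else N))
    return v


def run_decode(y, w, points, sigma, P, Q):
    """Hard decisions from (4)-(5) for a one-dimensional constellation."""
    N, n_long = time_sharing(P, Q)
    q = len(points)
    u_hat, pos = [], 0
    for i in range(P):
        reps = N + 1 if i < n_long else N
        # Log-domain version of (5): add the exponents of (4) over the repetitions.
        metric = [
            sum(-(y[pos + j] - points[(v + w[pos + j]) % q]) ** 2 / (2.0 * sigma ** 2)
                for j in range(reps))
            for v in range(q)
        ]
        u_hat.append(max(range(q), key=metric.__getitem__))
        pos += reps
    return u_hat


# Hypothetical example: rate 3/7 over 3-PAM with phi(k) = k - 1.
points, P, Q, sigma = [-1.0, 0.0, 1.0], 3, 7, 0.3
rng = random.Random(0)
u = [2, 0, 1]
v = run_encode(u, P, Q)
w = [rng.randrange(3) for _ in v]  # scrambling sequence known to the receiver
y = [points[(vj + wj) % 3] + rng.gauss(0.0, sigma) for vj, wj in zip(v, w)]
print(time_sharing(P, Q), v, run_decode(y, w, points, sigma, P, Q))
```

For rate 239/255 the same rule gives $N = 1$ and $\alpha P = 16$, that is, 16 symbols are sent twice and the remaining 223 are sent uncoded.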

IV. BMST over Groups 
A. BMST Codes with RUN Codes As Basic Codes 

We have constructed a class of codes called RUN codes with any given code rate over groups. 
However, the RUN codes perform far away from the Shannon limits, as evidenced by the 
examples in Fig. 3 and Fig. 4. To remedy this, we transmit the RUN codes in the BMST 
manner as inspired by the fact that, as pointed out in [9], any short code can be embedded 
into the BMST system to obtain extra coding gain in the low error-rate region. The resulting
codes are referred to as BMST-RUN codes. More precisely, we use the $B$-fold Cartesian product
of the RUN code $\mathcal{C}_{\mathrm{RUN}}[Q, P]$ (denoted as $\mathcal{C}_{\mathrm{RUN}}[Q, P]^B$) as the basic code. Fig. 5 shows the
encoding structure of a BMST-RUN code with memory $m$, where [RUN] represents the basic
encoder, [$\Pi_1$], $\dots$, [$\Pi_m$] represent $m$ symbol-wise interleavers, [+] represents the superposition
with modulo-$q$ addition, and [$\varphi$] represents the mapping $\varphi$. Let $u^{(t)} \in \mathcal{M}^{PB}$ and $v^{(t)} \in \mathcal{M}^{QB}$
be the information sequence and the corresponding codeword of the code $\mathcal{C}_{\mathrm{RUN}}[Q, P]^B$ at time
$t$, respectively. Then, the sub-codeword $c^{(t)}$ can be expressed as

$$c^{(t)} = v^{(t)} \oplus w^{(t,1)} \oplus \cdots \oplus w^{(t,m)}, \tag{11}$$

where $\oplus$ denotes the symbol-wise modulo-$q$ addition, $v^{(t)} = 0 \in \mathcal{M}^{QB}$ for $t < 0$ and $w^{(t,i)}$ is
the interleaved version of $v^{(t-i)}$ by the $i$-th interleaver $\Pi_i$ for $i = 1, \dots, m$. Then, $c^{(t)}$ is mapped
to the signal vector $s^{(t)} \in \mathcal{A}^{QB}$ symbol-by-symbol and input to the channel. After every $L$
sub-blocks of information sequence, we terminate the encoding by inputting $m$ all-zero sequences
$u^{(t)} = 0 \in \mathcal{M}^{PB}$ ($L \le t \le L + m - 1$) to the encoder. The termination will cause a code rate
loss. However, the rate loss is negligible when $L$ is large enough.

Fig. 5. Encoding structure of a BMST-RUN code with memory $m$.
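The superposition rule (11), including termination, can be sketched as follows. The sketch assumes that each interleaver is stored as a permutation list `perm` with the convention `w[k] = v[perm[k]]`; this convention and the function name `bmst_encode` are illustrative choices, not details given in the paper.

```python
def bmst_encode(v_blocks, interleavers, q):
    """Superimpose RUN codewords as in (11).

    v_blocks:     list of L codewords v^(t), each a list of symbols in {0..q-1}
    interleavers: list of m permutations Pi_1..Pi_m (assumed w[k] = v[perm[k]])
    Returns the L + m sub-codewords c^(t), termination included.
    """
    m = len(interleavers)
    n = len(v_blocks[0])
    zero = [0] * n
    # All-zero information blocks encode to all-zero RUN codewords.
    padded = list(v_blocks) + [zero] * m
    out = []
    for t in range(len(padded)):
        c = list(padded[t])
        for i, perm in enumerate(interleavers, start=1):
            prev = padded[t - i] if t - i >= 0 else zero  # v^(t-i) = 0 for t - i < 0
            for k in range(n):
                c[k] = (c[k] + prev[perm[k]]) % q  # symbol-wise modulo-q addition
        out.append(c)
    return out


# Hypothetical toy example over the modulo-4 group with memory m = 1.
print(bmst_encode([[1, 2], [3, 3]], [[1, 0]], 4))
```

With $m = 0$ the encoder reduces to transmitting the basic codewords unchanged, which is a quick sanity check on any implementation.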


B. Choice of Encoding Memory 

The critical parameter for BMST-RUN codes to approach the Shannon limits at a given target
SER is the encoding memory $m$, which can be determined by the genie-aided lower bound.
Essentially the same as for the binary BMST codes [9], the genie-aided bound for a BMST-RUN
code can be easily derived by assuming all but one of the sub-blocks $\{u^{(i)}, 0 \le i \le L - 1, i \ne t\}$
are known at the receiver. With this assumption, the genie-aided system becomes an equivalent
system that transmits the basic RUN codeword $m + 1$ times. Hence the performance of the
genie-aided system is the same as the RUN code obtained by time-sharing between the code
$\mathcal{C}[(N + 1)(m + 1), 1]$ and the code $\mathcal{C}[N(m + 1), 1]$. As a result, the genie-aided bound under a
mapping $\varphi$ is given by

$$\mathrm{SER} = f_{\text{BMST-RUN}}(\mathrm{SNR}, m) \ge f_{\text{genie}}(\mathrm{SNR}, m) = \alpha \cdot f_{\varphi,(N+1)(m+1)}(\mathrm{SNR}) + (1 - \alpha) \cdot f_{\varphi,N(m+1)}(\mathrm{SNR}), \tag{12}$$

which can be approximated using the union bound in the high SNR region.

Fig. 6. The unified (high-level) normal graph of a BMST-RUN code with $L = 4$ and $m = 2$.

Given a signal set $\mathcal{A}$ of size $q$ with labeling $\varphi$, a rate $R = P/Q$ and a target SER $p_{\text{target}}$, we
can construct a good BMST-RUN code using the following steps.

1) Construct the code $\mathcal{C}_{\mathrm{RUN}}[Q, P]^B$ over the modulo-$q$ group by finding $N$ such that
$\frac{1}{N+1} < \frac{P}{Q} \le \frac{1}{N}$ and determining the time-sharing factor $\alpha$ between the R code $[N + 1, 1]$ and the
R code $[N, 1]$. To approach the Shannon limit and to avoid error propagation, we usually
choose $B$ such that $QB > 1000$.

2) Find the Shannon limit $\gamma_{\lim}$ under the signal set $\mathcal{A}$ corresponding to the rate $R$.

3) Find an encoding memory $m$ such that $m$ is the minimum integer satisfying $f_{\text{genie}}(\gamma_{\lim}, m) \le p_{\text{target}}$.

4) Generate $m$ interleavers of size $QB$ uniformly at random.
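Step 3 above amounts to a one-dimensional search over $m$. The sketch below restricts itself to BPSK, for which the union bound is the exact performance, so that $f_{\varphi,n}(\mathrm{SNR}) = Q(\sqrt{n \cdot \mathrm{SNR}})$. It assumes signal points $\pm 1$ and $\mathrm{SNR} = 1/\sigma^2$, and it takes the Shannon limit as a user-supplied input rather than computing it. The function names and the example value of about 0.19 dB for rate 1/2 are assumptions made for illustration.

```python
import math


def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def genie_bound_bpsk(P, Q, m, snr_db):
    """f_genie of (12) for BPSK, assuming points +/-1 and SNR = 1/sigma^2."""
    N = Q // P
    alpha = Q / P - N
    snr = 10.0 ** (snr_db / 10.0)

    def ser_repetition(n):
        # Exact SER of C[n, 1] with BPSK: distance 2*sqrt(n) against noise sigma.
        return q_func(math.sqrt(n * snr))

    return (alpha * ser_repetition((N + 1) * (m + 1))
            + (1.0 - alpha) * ser_repetition(N * (m + 1)))


def choose_memory(P, Q, gamma_lim_db, p_target, m_max=1000):
    """Smallest m with f_genie(gamma_lim, m) <= p_target (step 3)."""
    for m in range(m_max + 1):
        if genie_bound_bpsk(P, Q, m, gamma_lim_db) <= p_target:
            return m
    raise ValueError("no memory up to m_max meets the target")


# Hypothetical input: rate 1/2 with an assumed Shannon limit of about 0.19 dB.
print(choose_memory(1, 2, 0.19, 1e-5))
```

For other constellations the closed-form $Q$-function would be replaced by a union-bound evaluation of (9), at the cost of the bound being only approximate at low SNR.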


C. Decoding of BMST-RUN Codes 

A BMST-RUN code can be decoded by an SWD algorithm with a decoding delay $d$ over its
normal graph, which is similar to that of the binary BMST codes [9]. Fig. 6 shows the unified
(high-level) normal graph of a BMST-RUN code with $L = 4$ and $m = 2$. The normal graph can
also be divided into layers, each of which consists of four types of nodes. These nodes represent
similar constraints to those for binary BMST codes and have similar message processing as
outlined below.

• The process at the node [RUN] is the SISO decoding of the RUN codes as described in
Section III-B.

• The process at the node [=] can be implemented in the same way as the message processing
at a generic variable node of an LDPC code (binary or non-binary).

• The process at the node [+] can be implemented in the same way as the message processing
at a generic check node of an LDPC code (binary or non-binary).

• The process at the node [$\Pi$] is the same as the original one, which interleaves or deinterleaves
the input messages.
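The message processing at the [=] and [+] nodes can be illustrated on probability mass functions over the modulo-$q$ group. This is a minimal sketch of the generic variable-node and check-node rules for a two-input adder; the function names are hypothetical, and a node with more inputs would apply the same operations repeatedly.

```python
def normalize(p):
    """Scale a non-negative vector so that it sums to one."""
    total = sum(p)
    return [x / total for x in p]


def equal_node(messages):
    """Node [=]: component-wise product of incoming pmfs (variable-node rule)."""
    out = [1.0] * len(messages[0])
    for msg in messages:
        out = [a * b for a, b in zip(out, msg)]
    return normalize(out)


def plus_node_to_sum(p_x, p_y):
    """Node [+]: pmf of c = x (+) y, a cyclic convolution of the input pmfs."""
    q = len(p_x)
    out = [0.0] * q
    for a in range(q):
        for b in range(q):
            out[(a + b) % q] += p_x[a] * p_y[b]
    return out


def plus_node_to_addend(p_c, p_y):
    """Node [+]: pmf of x = c (-) y, the message sent back toward one addend."""
    q = len(p_c)
    out = [0.0] * q
    for c in range(q):
        for b in range(q):
            out[(c - b) % q] += p_c[c] * p_y[b]
    return out


# Toy check over the modulo-4 group: x = 1 and y = 3 are known, so c = 0.
print(plus_node_to_sum([0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]))
```

The double loop costs $O(q^2)$ per pair of messages, which is acceptable for the moderate alphabet sizes considered here.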

Upon the arrival of the received vector $y^{(t)}$ (corresponding to the sub-block $c^{(t)}$) at time $t$,
the SWD algorithm takes as inputs the a posteriori probabilities (APPs) corresponding to $C^{(t)}$
and uses the APPs corresponding to $C^{(t-d)}, \dots, C^{(t)}$ to recover $u^{(t-d)}$, where the computation
of APPs is similar to (4). After $u^{(t-d)}$ is recovered, the decoder discards $y^{(t-d)}$ and slides one
layer of the normal graph to the “right” to recover $u^{(t-d+1)}$ with $y^{(t+1)}$ received.


V. Examples of BMST-RUN Codes 

In this section, we present simulation results of several BMST-RUN codes over AWGN 
channels and Rayleigh flat fading channels, where code parameters are shown in Table I. For 
all simulations, the encoder terminates every L = 1000 sub-blocks and the decoder executes the 
SWD algorithm with a maximum iteration number of 18. Unless otherwise specified, the decoding delay
$d$ of the SWD algorithm is set to be $3m$.


A. BMST-RUN Codes with One-Dimensional Signal Sets 

Consider BMST-RUN codes of rates $K/8$ ($K = 1, \dots, 7$) defined with BPSK modulation to
approach the Shannon limits at the SER of $10^{-5}$. Fig. 7 shows the required SNRs for the
BMST-RUN codes to achieve the SER of $10^{-5}$. Also shown in Fig. 7 is the channel capacity
curve with i.u.d. inputs. It can be seen that the gaps between the required SNRs and the Shannon 
limits are within 1 dB for all considered rates. 




TABLE I 

Construction Examples of BMST-RUN Codes over AWGN Channels 
* The mappings in this table are the same as those specified in Fig. 2. Notice that the shaping gain of the non-uniformly
spaced 16-PAM is about 0.5 dB. 


Consider BMST-RUN codes of rates $K/7$ ($K = 1, \dots, 6$) defined with 3-PAM to approach the
Shannon limits at the SER of $10^{-4}$. Fig. 8 shows the SER performance curves for all codes
together with their lower bounds and the corresponding Shannon limits. We can see that the
performance curves match well with the corresponding lower bounds for all codes in the high
SNR region. In addition, all codes have an SER lower than $10^{-4}$ at the SNR within 1 dB away
from the corresponding Shannon limits, which is similar to the BPSK modulation case.

Fig. 7. The required SNRs to achieve the SER of $10^{-5}$ for the BMST-RUN codes with the codes $\mathcal{C}_{\mathrm{RUN}}[Q, P]^{1250}$ ($P/Q = 1/8, \dots, 7/8$) as basic codes defined with BPSK modulation.

Fig. 8. Performances of the BMST-RUN codes with the codes $\mathcal{C}_{\mathrm{RUN}}[Q, P]^{300}$ ($P/Q = 1/7, \dots, 6/7$) as basic codes defined with 3-PAM.

Consider a rate-1/2 BMST-RUN code of memory 5 defined over two distinct 16-PAM constellations,
where one consists of uniformly spaced signal points (under the mapping $\varphi_8$ in Fig. 2)
and the other consists of non-uniformly spaced signal points (under the mapping $\varphi_9$ in Fig. 2) as
designed in [18]. The SER performance curves with a decoding delay $d = 15$ together with the
lower bounds and the Shannon limits are shown in Fig. 9. From the figure, we can see that the
BMST-RUN code has an SER lower than $10^{-3}$ at the SNR about 1.0 dB away from the respective
Shannon limits for both uniformly spaced signal points and non-uniformly spaced signal points.
In addition, the BMST-RUN code with non-uniformly spaced signal points performs about 0.5 dB
better than that with uniformly spaced signal points and also has a lower error floor.

Fig. 9. Comparison of the BMST-RUN code with the code $\mathcal{C}_{\mathrm{RUN}}[2, 1]^{250}$ as the basic code defined with two distinct 16-PAM
constellations under the mappings $\varphi_8$ and $\varphi_9$ in Fig. 2.

B. BMST-RUN Codes with Two-Dimensional Signal Sets 

Consider BMST-RUN codes of rates $K/5$ ($K = 1, \dots, 4$) defined with 8-PSK modulation to
approach the Shannon limits at the SER of $10^{-4}$. Fig. 10 shows the SER performance curves
for all codes together with their lower bounds and the corresponding Shannon limits.

Fig. 10. Performances of the BMST-RUN codes with the codes $\mathcal{C}_{\mathrm{RUN}}[Q, P]^{150}$ ($P/Q = 1/5, \dots, 4/5$) as basic codes defined with 8-PSK
modulation.

Consider a BMST-RUN code of rate $239/255$ defined with 16-QAM (under the mapping $\varphi_7$ in
Fig. 2) to approach the Shannon limit at the SER of $10^{-3}$, where an encoding memory $m = 2$
is required. The SER performance curves with decoding delays $d = 6$ and $20$ together with the
lower bound and the Shannon limit are shown in Fig. 11. Since a large fraction of information
symbols ($223/239$) are uncoded in the basic code, a large decoding delay $d = 10m = 20$ is required
to approach the lower bound. With the decoding delay $d = 20$, the BMST-RUN code achieves
the SER of $10^{-3}$ at the SNR about 1 dB away from the Shannon limit.

Fig. 11. Performance of the BMST-RUN code with the code $\mathcal{C}_{\mathrm{RUN}}[255, 239]^{4}$ as the basic code defined with 16-QAM, where
the mapping is $\varphi_7$ in Fig. 2.

From the above two examples, we can see that BMST codes with two-dimensional signal
constellations behave similarly to those with one-dimensional signal constellations.

Fig. 12. Performance of the BMST-RUN codes with the codes $\mathcal{C}_{\mathrm{RUN}}[7, K]^{200}$ ($K = 1, \dots, 6$) over the modulo-4 group and
the BMST-BICM scheme with the codes $\mathcal{C}_{\mathrm{RUN}}[7, K]^{400}$ ($K = 1, \dots, 6$) over $\mathbb{F}_2$ as basic codes, where both schemes are under
4-PAM with the mapping $\varphi_3$ in Fig. 2.

Fig. 13. The required SNRs to achieve the BER of $10^{-4}$ over AWGN channels for the BMST-RUN codes with the codes
$\mathcal{C}_{\mathrm{RUN}}[7, K]^{200}$ ($K = 1, \dots, 6$) over the modulo-4 group and the BMST-BICM scheme with the codes $\mathcal{C}_{\mathrm{RUN}}[7, K]^{400}$ ($K = 1, \dots, 6$)
over $\mathbb{F}_2$ as basic codes, where both schemes are under 4-PAM with the mapping $\varphi_3$ in Fig. 2.


C. Comparison with BMST-BICM 

The examples in the previous subsections suggest that the proposed construction is effective
for a wide range of code rates and signal sets. Also, the SWD algorithm is near-optimal in the
high SNR region. Since binary BMST codes also have such behaviors and can be combined
with different signal sets [14], we need to clarify the advantage of BMST-RUN codes over groups.
Some advantages have been mentioned in the Introduction. In this subsection, we will show that
the BMST-RUN codes can perform better than the BMST-BICM scheme.

To make a fair comparison, we have the following settings. 

• For the BMST-BICM scheme, the basic codes are the RUN codes $[7, K]^{400}$ ($K = 1, \dots, 6$)
over $\mathbb{F}_2$, while for the BMST-RUN codes, the basic codes are the RUN codes $[7, K]^{200}$
($K = 1, \dots, 6$) over the modulo-4 group. Such a setting ensures that both schemes have the same
sub-block length 2800 in bits.

• Both the BMST-RUN codes and the BMST-BICM scheme use the 4-PAM with the mapping
$\varphi_3$ in Fig. 2.

• For a specific code rate, the BMST-BICM scheme has the same encoding memory and the
same decoding delay as the BMST-RUN code. The encoding memories are presented in
Table I, while the decoding delay is set to be $3m$ for an encoding memory $m$.
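The equal sub-block length in the first setting can be checked by counting bits per coded symbol, namely $\log_2 2 = 1$ bit per symbol over $\mathbb{F}_2$ and $\log_2 4 = 2$ bits per symbol over the modulo-4 group:

```latex
\underbrace{7 \times 400}_{\text{symbols over } \mathbb{F}_2} \times \log_2 2 = 2800 \text{ bits},
\qquad
\underbrace{7 \times 200}_{\text{symbols over the modulo-4 group}} \times \log_2 4 = 2800 \text{ bits}.
```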

Since the performance of the BMST-BICM scheme cannot be measured in SER, we compare
the performance in BER. Fig. 12 shows the BER performance curves for both the BMST-RUN
codes (denoted as “RUN”) and the BMST-BICM scheme (denoted as “BICM”) together with
the Shannon limits. Fig. 13 shows the required SNRs to achieve the BER of $10^{-4}$ for both the
BMST-RUN codes and the BMST-BICM scheme together with the capacity curve of 4-PAM under
i.u.d. inputs. From these two figures, we have the following observations.

• With the same encoding memory and decoding delay, the BMST-RUN codes achieve a 
lower BER than the BMST-BICM scheme for all considered code rates. 

• The BMST-RUN codes perform better than the BMST-BICM scheme in the lower code rate 
region and have a similar performance as the BMST-BICM scheme in the high code rate 
region. 

D. BMST-RUN Codes over Rayleigh Channels 

It has been shown that BMST-RUN codes perform well over AWGN channels and are comparable
to binary BMST codes with BICM. More interestingly and importantly, the BMST construction
is also applicable to other ergodic channels. Here, we give an example for fading channels as
evidence.
Consider BMST-RUN codes of rates $K/7$ ($K = 1, \dots, 6$) defined with 4-PAM modulation (under
the mapping $\varphi_3$ in Fig. 2) over Rayleigh flat fading channels. To approach the Shannon limits
at the SER of $10^{-4}$, the required encoding memories for rates $1/7$, $2/7$, $3/7$, $4/7$, $5/7$, and $6/7$ are 7, 7, 6, 7, 5,
and 4, respectively. Fig. 14 shows the required SNRs for the BMST-RUN codes to achieve the
SER of $10^{-4}$. Also shown in Fig. 14 is the channel capacity curve with i.u.d. inputs. It can be
seen that the gaps between the required SNRs and the Shannon limits are about 1 dB for all
rates, which is similar to the case for AWGN channels.

Fig. 14. The required SNRs to achieve the SER of $10^{-4}$ for the BMST-RUN codes with the codes $\mathcal{C}_{\mathrm{RUN}}[Q, P]^{200}$ ($P/Q = 1/7, \dots, 6/7$)
as basic codes defined with 4-PAM modulation (under the mapping $\varphi_3$ in Fig. 2) over Rayleigh flat fading channels.

VI. Conclusions 

In this paper, by combining the block Markov superposition transmission (BMST) with the 
RUN codes over groups, we have proposed a simple scheme called BMST-RUN codes to 
approach the Shannon limits at any target symbol-error-rate (SER) with any given (rational) 
rate over any alphabet (of moderate size). We have also derived the genie-aided lower bound 
for the BMST-RUN codes. Simulation results have shown that the BMST-RUN codes have a 
similar behavior to the binary BMST codes and have good performance for a wide range of code 
rates over both AWGN channels and Rayleigh flat fading channels. Compared with the BMST 
with bit-interleaved coded modulation (BMST-BICM) scheme, the BMST-RUN codes are more
flexible in that they can be combined with signal sets of any size. In addition, with the same encoding
memory, the BMST-RUN codes have a better performance than the BMST-BICM scheme under
the same decoding latency.